Abstract

Approximate Bayesian computation has emerged as a standard computational tool 
when dealing with the increasingly common scenario of completely intractable like- 
lihood functions in Bayesian inference. We show that many common Markov chain 
Monte Carlo kernels used to facilitate inference in this setting can fail to be variance 
bounding, and hence geometrically ergodic, which can have consequences for the re- 
liability of estimates in practice. We then prove that a recently introduced Markov 
kernel in this setting can be variance bounding and geometrically ergodic whenever 
its intractable Metropolis-Hastings counterpart is, under reasonably weak and man- 
ageable conditions. We indicate that the computational cost of the latter kernel is 
bounded whenever the prior is proper, and present indicative results on an example 
where spectral gaps and asymptotic variances can be computed. 

Keywords: Approximate Bayesian computation; Markov chain Monte Carlo; Variance 
bounding; Geometric ergodicity; Local adaptation. 



1 Introduction 

Approximate Bayesian computation refers to a branch of Monte Carlo methodology that
utilizes the ability to simulate data according to a parametrized likelihood function, in lieu
of computing that likelihood, to perform approximate, parametric Bayesian inference.
These methods have been used in an increasingly diverse range of applications since their
inception in the context of population genetics (Tavaré et al., 1997; Pritchard et al., 1999),
particularly in cases where the likelihood function is either impossible or computationally
prohibitive to evaluate.

We are in a standard Bayesian setting with data $y \in \mathsf{Y}$, a parameter space $\Theta$, a prior
$p : \Theta \to \mathbb{R}_+$ and for each $\theta \in \Theta$ a likelihood $f_\theta : \mathsf{Y} \to \mathbb{R}_+$. We assume $\mathsf{Y}$ is a metric space
and consider the artificial likelihood

$$f^\epsilon_\theta(y) := V(\epsilon)^{-1} \int_{\mathsf{Y}} 1_{B_\epsilon(x)}(y) f_\theta(x)\,\mathrm{d}x = V(\epsilon)^{-1} f_\theta(B_\epsilon(y)), \qquad (1)$$

which is commonly employed in approximate Bayesian computation. Here, $B_r(z)$ denotes
a metric ball of radius $r$ around $z$, $V(r) := \int_{\mathsf{Y}} 1_{B_r(0)}(x)\,\mathrm{d}x$ denotes the volume of a ball of
radius $r$ in $\mathsf{Y}$ and $1$ is the indicator function. We adopt a slight abuse of notation by referring
to densities as distributions, and where convenient, employ the measure-theoretic notation
$\mu(A) = \int_A \mu(\mathrm{d}\lambda)$. We consider situations in which both $\epsilon$ and $y$ are fixed, and so define
functions $h : \Theta \to [0, 1]$ and $w : \mathsf{Y} \to [0, 1]$ by $h(\theta) := f_\theta(B_\epsilon(y))$ and $w(x) := 1_{B_\epsilon(x)}(y)$ to
simplify the presentation. The value $h(\theta)$ can be interpreted as the probability of "hitting"
$B_\epsilon(y)$ with a sample drawn from $f_\theta$.
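
To make the hitting probability concrete, the following sketch estimates $h(\theta)$ by simulation. The model is a toy assumption that does not come from the text: $f_\theta = N(\theta, 1)$ on the real line, $y = 0$ and $\epsilon = 0.5$. For this assumed model $h$ has a closed form, which lets the Monte Carlo estimate be checked; in a genuine approximate Bayesian computation problem only the simulation would be available. The function names are illustrative.

```python
import math
import random


def hit_probability_mc(theta, y=0.0, eps=0.5, n_sim=20000, rng=None):
    """Estimate h(theta) = f_theta(B_eps(y)) by simulating x ~ N(theta, 1)."""
    rng = rng or random.Random(0)
    hits = sum(abs(rng.gauss(theta, 1.0) - y) <= eps for _ in range(n_sim))
    return hits / n_sim


def hit_probability_exact(theta, y=0.0, eps=0.5):
    """Closed form for the assumed Gaussian toy model (unavailable in real ABC)."""
    def cdf(t):
        return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))
    return cdf(y + eps - theta) - cdf(y - eps - theta)


for theta in (0.0, 1.0, 3.0):
    print(theta, hit_probability_mc(theta), hit_probability_exact(theta))
```

As $\theta$ moves away from $y$ the estimate of $h(\theta)$ shrinks towards zero, which is the regime in which the behaviour of the kernels studied below differs most.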

While this likelihood is also intractable in general, the "approximate" posterior it induces,
$\pi(\theta) := h(\theta)p(\theta) / \int_\Theta h(\vartheta)p(\vartheta)\,\mathrm{d}\vartheta$, can be dealt with using constrained versions of standard
methods (see, e.g., Marin et al., 2012) when sampling from $f_\theta$ is possible for any $\theta \in \Theta$.
We are often interested in computing $\pi(\varphi) := \int \varphi(\theta)\pi(\theta)\,\mathrm{d}\theta$, the posterior expectation of
some function $\varphi$, and it is this type of quantity that can be approximated using Monte Carlo
methodology. We focus on one such method, Markov chain Monte Carlo, whereby a Markov
chain is constructed by sampling iteratively from an irreducible Markov kernel $P$ with unique
stationary distribution $\pi$. We can use such a chain directly to estimate $\pi(\varphi)$ using partial
sums, i.e. given the realization $\theta_1, \theta_2, \ldots$ of a chain started at $\theta_0$, where $\theta_i \sim P(\theta_{i-1}, \cdot)$ for
$i \in \mathbb{N}$, we compute

$$\frac{1}{m}\sum_{i=1}^{m} \varphi(\theta_i) \qquad (2)$$

for some $m$. Alternatively, the Markov kernels can be used within other methods such as
sequential Monte Carlo (Del Moral et al., 2006). In the former case, it is desirable that a
central limit theorem holds for (2) and further that the asymptotic variance, $\sigma^2_{\mathrm{as}}(P, \varphi)$, of
(2) be reasonably small, while in the latter it is desirable that the kernel be geometrically
ergodic, i.e. $P^m(\theta_0, \cdot)$ converges at a geometric rate in $m$ to $\pi$ in total variation, where $P^m$
is the $m$-fold iterate of $P$ (see, e.g., Roberts & Rosenthal, 2004; Meyn & Tweedie, 2009),
at least because this property is often assumed in analyses (see, e.g., Jasra & Doucet, 2008;
Whiteley, 2012). In addition, consistent estimation of $\sigma^2_{\mathrm{as}}(P, \varphi)$ is well established (Hobert
et al., 2002; Jones et al., 2006; Bednorz & Łatuszyński, 2007; Flegal & Jones, 2010) for
geometrically ergodic chains.

Motivated by these considerations, we study both the variance bounding (Roberts & Rosenthal,
2008) and geometric ergodicity properties of a number of reversible kernels used for
approximate Bayesian computation. For reversible $P$, a central limit theorem holds for all
$\varphi \in L^2(\pi)$ if and only if $P$ is variance bounding (Roberts & Rosenthal, 2008, Theorem 7),
where $L^2(\pi)$ is the space of square-integrable functions with respect to $\pi$.

Much of the literature to date has sought to control the trade-off between the quality
of the approximation (1), controlled by $\epsilon$ and manipulation of $y$, and the counteracting
computational difficulties (see, e.g., Fearnhead & Prangle, 2012). We address here a separate issue,
namely that many Markov kernels used in this context are neither variance bounding nor
geometrically ergodic, for any finite $\epsilon$, in rather general situations when using "local" proposal
distributions.

As a partial remedy to the problems identified by this negative result, we also show that
under reasonably mild conditions, a kernel proposed in Lee et al. (2012) can inherit variance
bounding and geometric ergodicity from its intractable Metropolis-Hastings (Metropolis
et al., 1953; Hastings, 1970) counterpart. This allows for the specification of a broad class of
models for which we can be assured this particular kernel will be geometrically ergodic. In
addition, conditions ensuring inheritance of either property can be met without knowledge
of $f_\theta$, e.g. by using a symmetric proposal and a prior that is continuous, everywhere positive
and has exponential or heavier tails.

To assist in the interpretation of results and the quantitative example in the discussion, we
provide some background on the spectral properties of variance bounding and geometrically
ergodic Markov kernels. Both variance bounding and geometric ergodicity of a Markov
kernel $P$ are related to $S(P)$, the spectrum of $P$ considered as an operator on the restriction
of $L^2(\pi)$ to zero-mean functions (see, e.g., Mira & Geyer, 1999; Mira, 2001). Variance
bounding is equivalent to $\sup S(P) < 1$ (Roberts & Rosenthal, 2008, Theorem 14) and geometric
ergodicity is equivalent to $\sup |S(P)| < 1$ (Roberts & Rosenthal, 1997, Theorem 2.1).
The spectral gap $1 - \sup|S(P)|$ of a geometrically ergodic kernel is closely related to its
aforementioned geometric rate of convergence to $\pi$, with faster rates associated with larger
spectral gaps. In particular, its convergence in total variation satisfies, for some $C : \Theta \to \mathbb{R}_+$
and $\rho \le \sup|S(P)|$,

$$\|\pi(\cdot) - P^m(\theta_0, \cdot)\|_{\mathrm{TV}} \le C(\theta_0)\rho^m. \qquad (3)$$



2 The Markov kernels 

In this section we describe the algorithmic specification of the $\pi$-invariant Markov kernels
under study. The algorithms specify how to sample from each kernel; in each, a candidate $\vartheta$
is proposed according to a common proposal $q(\theta, \cdot)$ and accepted or rejected, possibly along
with other auxiliary variables, using simulations from the likelihoods $f_\vartheta$ and $f_\theta$. We assume
that for all $\theta \in \Theta$, $q(\theta, \cdot)$ and $p$ are densities with respect to a common dominating measure,
e.g. the Lebesgue or counting measures.

The first and most basic Markov kernel in this setting was proposed in Marjoram et al.
(2003), and is a special case of a "pseudo-marginal" kernel (Beaumont, 2003; Andrieu &
Roberts, 2009). Such kernels have been used in the context of approximate Bayesian
computation in Becquet & Przeworski (2007) and Del Moral et al. (2012) and evolve on $\Theta \times \mathsf{Y}^N$
by additionally sampling auxiliary variables $z_{1:N} \sim f_\vartheta^{\otimes N}$ for a fixed $N \in \mathbb{N}$. We denote
kernels of this type for any $N$ by $P_{1,N}$, and describe their simulation in Algorithm 1.



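Algorithm 1 To sample from $P_{1,N}((\theta, x_{1:N}), \cdot)$

1. Sample $\vartheta \sim q(\theta, \cdot)$ and $z_{1:N} \sim f_\vartheta^{\otimes N}$.

2. With probability $1 \wedge \dfrac{p(\vartheta)q(\vartheta,\theta)\sum_{j=1}^{N} w(z_j)}{p(\theta)q(\theta,\vartheta)\sum_{j=1}^{N} w(x_j)}$, output $(\vartheta, z_{1:N})$. Otherwise, output $(\theta, x_{1:N})$.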
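
A minimal sketch of one step of Algorithm 1 follows. It assumes a toy model of the same form as Example 1 in Section 3 (geometric prior on the positive integers, $h(\theta) = b^\theta$, symmetric $\pm 1$ proposal); the parameter values and function names are illustrative. Because the acceptance ratio only involves $w(x_j)$ and $w(z_j)$, the sketch stores the hit indicators in place of the simulated data themselves.

```python
import random

A, B = 0.5, 0.9  # illustrative values for the assumed toy model


def prior(theta):
    """p(theta) = (1 - a) a^(theta - 1) on {1, 2, ...}, zero elsewhere."""
    return (1.0 - A) * A ** (theta - 1) if theta >= 1 else 0.0


def simulate_hits(theta, n, rng):
    """Return w(x_1), ..., w(x_n) for x_j ~ f_theta, i.e. Bernoulli(h(theta)) draws."""
    h = B ** theta
    return [1 if rng.random() < h else 0 for _ in range(n)]


def p1_step(theta, hits, n, rng):
    """One draw from P_{1,N}((theta, x_{1:N}), .), carrying w(x_j) instead of x_j."""
    prop = theta + rng.choice((-1, 1))        # symmetric proposal, so q cancels
    if prior(prop) == 0.0:
        return theta, hits                    # acceptance ratio is zero
    new_hits = simulate_hits(prop, n, rng)    # z_{1:N} ~ f_prop
    ratio = prior(prop) * sum(new_hits) / (prior(theta) * sum(hits))
    if rng.random() < min(1.0, ratio):
        return prop, new_hits
    return theta, hits


rng = random.Random(0)
state, hits = 1, [1] * 5   # start from a state whose stored hit count is positive
for _ in range(1000):
    state, hits = p1_step(state, hits, 5, rng)
print(state, sum(hits))
```

The stored hit count stays positive along the chain, since a proposal with no hits has acceptance ratio zero; this is what keeps the denominator of the ratio well defined.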



In Lee et al. (2012), two alternative kernels were proposed in this context, both of which
evolve on $\Theta$. One, denoted $P_{2,N}$ and described in Algorithm 2, is an alternative
pseudo-marginal kernel that, in addition to sampling $z_{1:N} \sim f_\vartheta^{\otimes N}$, also samples auxiliary
variables $x_{1:N-1} \sim f_\theta^{\otimes (N-1)}$. Detailed balance can be verified directly upon interpreting
$S_z := \sum_{j=1}^{N} w(z_j)$ and $S_x := \sum_{j=1}^{N-1} w(x_j)$ as Binomial$(N, h(\vartheta))$ and Binomial$(N-1, h(\theta))$
random variables respectively. The other kernel, denoted $P_3$ and described in Algorithm 3, also
involves sampling according to $f_\theta$ and $f_\vartheta$ but does not sample a fixed number of auxiliary
variables. This kernel also satisfies detailed balance (Lee, 2012, Proposition 1).



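Algorithm 2 To sample from $P_{2,N}(\theta, \cdot)$

1. Sample $\vartheta \sim q(\theta, \cdot)$, $x_{1:N-1} \sim f_\theta^{\otimes (N-1)}$ and $z_{1:N} \sim f_\vartheta^{\otimes N}$.

2. With probability $1 \wedge \dfrac{p(\vartheta)q(\vartheta,\theta)\sum_{j=1}^{N} w(z_j)}{p(\theta)q(\theta,\vartheta)\left[1 + \sum_{j=1}^{N-1} w(x_j)\right]}$, output $\vartheta$. Otherwise, output $\theta$.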
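
The following sketch of one step of Algorithm 2 uses the same assumed toy model as Example 1 in Section 3 (geometric prior, $h(\theta) = b^\theta$, symmetric $\pm 1$ proposal). Parameter values and names are illustrative. Only the hit counts are simulated, as Binomial draws, since the acceptance ratio depends on the data through them alone.

```python
import random

A, B = 0.5, 0.9  # illustrative values for the assumed toy model


def prior(theta):
    """p(theta) = (1 - a) a^(theta - 1) on {1, 2, ...}, zero elsewhere."""
    return (1.0 - A) * A ** (theta - 1) if theta >= 1 else 0.0


def count_hits(theta, n, rng):
    """Binomial(n, h(theta)) draw: how many of n simulated data sets hit B_eps(y)."""
    h = B ** theta
    return sum(rng.random() < h for _ in range(n))


def p2_step(theta, n, rng):
    """One draw from P_{2,N}(theta, .); the state is theta alone."""
    prop = theta + rng.choice((-1, 1))     # symmetric proposal, so q cancels
    if prior(prop) == 0.0:
        return theta
    s_x = count_hits(theta, n - 1, rng)    # from x_{1:N-1} ~ f_theta
    s_z = count_hits(prop, n, rng)         # from z_{1:N} ~ f_prop
    ratio = prior(prop) * s_z / (prior(theta) * (1 + s_x))
    return prop if rng.random() < min(1.0, ratio) else theta


rng = random.Random(0)
theta, total, n_iter = 1, 0, 50000
for _ in range(n_iter):
    theta = p2_step(theta, 5, rng)
    total += theta
print(total / n_iter, 1.0 / (1.0 - A * B))  # chain average vs. posterior mean
```

Unlike Algorithm 1, nothing needs to be carried between iterations: fresh auxiliary variables are drawn at the current state every time.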



Algorithm 3 To sample from $P_3(\theta, \cdot)$

1. Sample $\vartheta \sim q(\theta, \cdot)$.

2. With probability $1 - \left(1 \wedge \dfrac{p(\vartheta)q(\vartheta,\theta)}{p(\theta)q(\theta,\vartheta)}\right)$, stop and output $\theta$.

3. For $i = 1, 2, \ldots$ until $\sum_{j=1}^{i} w(z_j) + w(x_j) \ge 1$, sample $x_i \sim f_\theta$ and $z_i \sim f_\vartheta$. Set $N \leftarrow i$.

4. If $w(z_N) = 1$, output $\vartheta$. Otherwise, output $\theta$.

Because many of our positive results for $P_3$ are in relation to $P_{\mathrm{MH}}$, the Metropolis-Hastings
kernel with proposal $q$, we provide the algorithmic specification of sampling from $P_{\mathrm{MH}}$ in
Algorithm 4. We note that in the approximate Bayesian computation setting use of $P_{\mathrm{MH}}$ is
ruled out by assumption since $h$ cannot be computed, and that the preceding kernels are, in
some sense, "exact approximations" of $P_{\mathrm{MH}}$.

Algorithm 4 To sample from $P_{\mathrm{MH}}(\theta, \cdot)$

1. Sample $\vartheta \sim q(\theta, \cdot)$.

2. With probability $1 \wedge \dfrac{p(\vartheta)h(\vartheta)q(\vartheta,\theta)}{p(\theta)h(\theta)q(\theta,\vartheta)}$, output $\vartheta$. Otherwise, output $\theta$.
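
Algorithms 3 and 4 can be sketched side by side on a toy model. The model below is an assumption with the same form as Example 1 in Section 3 (geometric prior, $h(\theta) = b^\theta$, symmetric $\pm 1$ proposal), chosen because $h$ is then known and the Metropolis-Hastings kernel can be run for comparison; in a real application only the first function would be usable. Names and parameter values are illustrative.

```python
import random

A, B = 0.5, 0.9  # illustrative values for the assumed toy model


def prior(theta):
    """p(theta) = (1 - a) a^(theta - 1) on {1, 2, ...}, zero elsewhere."""
    return (1.0 - A) * A ** (theta - 1) if theta >= 1 else 0.0


def h(theta):
    return B ** theta


def p3_step(theta, rng):
    """Algorithm 3: returns (new state, N) using only simulated hit indicators."""
    prop = theta + rng.choice((-1, 1))            # symmetric proposal, so q cancels
    if rng.random() >= min(1.0, prior(prop) / prior(theta)):
        return theta, 0                           # stopped at step 2
    n = 0
    while True:                                   # step 3: race until a hit occurs
        n += 1
        w_x = rng.random() < h(theta)             # w(x_n), x_n ~ f_theta
        w_z = rng.random() < h(prop)              # w(z_n), z_n ~ f_prop
        if w_x or w_z:
            return (prop if w_z else theta), n    # step 4


def mh_step(theta, rng):
    """Algorithm 4: needs h explicitly; returns (new state, 0) for a uniform interface."""
    prop = theta + rng.choice((-1, 1))
    if prior(prop) == 0.0:
        return theta, 0
    ratio = prior(prop) * h(prop) / (prior(theta) * h(theta))
    return (prop if rng.random() < min(1.0, ratio) else theta), 0


def run(step, n_iter, seed):
    """Average of theta along a chain started at theta = 1."""
    rng, theta, total = random.Random(seed), 1, 0
    for _ in range(n_iter):
        theta, _ = step(theta, rng)
        total += theta
    return total / n_iter


print(run(p3_step, 50000, 0), run(mh_step, 50000, 0), 1.0 / (1.0 - A * B))
```

Both chain averages should be close to the posterior mean $1/(1 - ab)$ of the assumed model, while only `mh_step` ever evaluates `h` inside an acceptance ratio.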



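The kernels share a similar structure, and $P_{2,N}$, $P_3$ and $P_{\mathrm{MH}}$ can each be written as

$$P(\theta, \mathrm{d}\vartheta) = q(\theta, \mathrm{d}\vartheta)\alpha(\theta, \vartheta) + \left(1 - \int q(\theta, \mathrm{d}\theta')\alpha(\theta, \theta')\right)\delta_\theta(\mathrm{d}\vartheta), \qquad (4)$$

where only the function $\alpha(\theta, \vartheta)$ differs. $P_{1,N}$ can be represented similarly, with modifications
to account for its evolution on the extended space $\Theta \times \mathsf{Y}^N$. The representation (4) is used
extensively in our analysis, and we have for $P_{2,N}$, $P_3$ and $P_{\mathrm{MH}}$, respectively,

$$\alpha_{2,N}(\theta, \vartheta) = \int_{\mathsf{Y}^{N-1}}\int_{\mathsf{Y}^{N}} \left\{1 \wedge \frac{c(\vartheta,\theta)\sum_{j=1}^{N} w(z_j)}{c(\theta,\vartheta)\left[1 + \sum_{j=1}^{N-1} w(x_j)\right]}\right\} f_\vartheta^{\otimes N}(\mathrm{d}z_{1:N})\, f_\theta^{\otimes (N-1)}(\mathrm{d}x_{1:N-1}), \qquad (5)$$

$$\alpha_3(\theta, \vartheta) = \left\{1 \wedge \frac{c(\vartheta,\theta)}{c(\theta,\vartheta)}\right\} \frac{h(\vartheta)}{h(\theta) + h(\vartheta) - h(\theta)h(\vartheta)}, \qquad (6)$$

$$\alpha_{\mathrm{MH}}(\theta, \vartheta) = 1 \wedge \frac{c(\vartheta,\theta)h(\vartheta)}{c(\theta,\vartheta)h(\theta)}, \qquad (7)$$

where $c(\theta, \vartheta) := p(\theta)q(\theta, \vartheta)$ and (6) is obtained, e.g., in Lee (2012). Finally, we reiterate
that all the kernels satisfy detailed balance and are therefore reversible.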
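
The acceptance probability of $P_3$ can be read off Algorithm 3 directly. The short derivation below is a sketch added for orientation; it uses only the definitions of $h$, $w$ and $c$ given above and is not a substitute for the argument in Lee (2012).

```latex
% Each pass of step 3 draws x_i ~ f_theta and z_i ~ f_vartheta independently, so
%   Pr(w(z_i) = 1) = h(\vartheta),
%   Pr(w(x_i) + w(z_i) \ge 1) = h(\theta) + h(\vartheta) - h(\theta)h(\vartheta).
% The loop stops at the first pass N with at least one hit, and the proposal is
% accepted if z_N is among the hits at that pass.
\begin{align*}
\alpha_3(\theta,\vartheta)
  &= \Bigl\{1 \wedge \tfrac{c(\vartheta,\theta)}{c(\theta,\vartheta)}\Bigr\}\,
     \Pr\bigl(w(z_N) = 1 \,\big|\, w(x_N) + w(z_N) \ge 1\bigr) \\
  &= \Bigl\{1 \wedge \tfrac{c(\vartheta,\theta)}{c(\theta,\vartheta)}\Bigr\}\,
     \frac{h(\vartheta)}{h(\theta) + h(\vartheta) - h(\theta)h(\vartheta)} .
\end{align*}
```

The first factor is the probability of not stopping at step 2, and the second is the probability that the proposal side scores a hit in the round that ends the race.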



3 Theoretical properties 

We assume that $\Theta$ is a metric space, and that $H := \int p(\theta)h(\theta)\,\mathrm{d}\theta$ satisfies $H \in (0, \infty)$ so $\pi$
is well defined. We allow $p$ to be improper, i.e. for $\int p(\theta)\,\mathrm{d}\theta$ to be unbounded, but otherwise
assume $p$ is normalized, i.e. $\int p(\theta)\,\mathrm{d}\theta = 1$. We define the collection of local proposals to be

$$\mathcal{Q} := \{q : \forall \delta > 0,\ \exists r \in (0, \infty),\ \forall \theta \in \Theta,\ q(\theta, B_r^c(\theta)) < \delta\}, \qquad (8)$$

which encompasses a broad range of common choices in practice. We denote by $\mathcal{V}$ and $\mathcal{G}$
the collections of variance bounding and geometrically ergodic kernels, respectively, noting
that $\mathcal{G} \subset \mathcal{V}$. In our analysis, we make use of the following conditions.



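(C1)

(i) $q \in \mathcal{Q}$.

(ii) $\forall r > 0$, $\pi(B_r^c(0)) > 0$.

(iii) $\forall \delta > 0$, $\exists v > 0$, $\sup_{\theta \in B_v^c(0)} h(\theta) < \delta$.

(C2)

(i) $q \in \mathcal{Q}$.

(ii) $\forall K > 0$, $\exists M_K \in (0, \infty)$, $\forall (\theta, \vartheta) \in \Theta \times B_K(\theta)$ : $\pi(\theta)q(\theta, \vartheta) \wedge \pi(\vartheta)q(\vartheta, \theta) > 0$,
either $\dfrac{h(\vartheta)}{h(\theta)} \in [M_K^{-1}, M_K]$ or $\dfrac{c(\vartheta, \theta)}{c(\theta, \vartheta)} \in [M_K^{-1}, M_K]$.

(C3)

$\exists M \in (0, \infty)$, $\forall (\theta, \vartheta) \in \Theta^2$ : $\pi(\theta)q(\theta, \vartheta) \wedge \pi(\vartheta)q(\vartheta, \theta) > 0$,
either $\dfrac{h(\vartheta)}{h(\theta)} \in [M^{-1}, M]$ or $\dfrac{c(\vartheta, \theta)}{c(\theta, \vartheta)} \in [M^{-1}, M]$.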





Condition (C1) ensures that the posterior has mass arbitrarily far from $0$ but that $h(\theta)$ gets
arbitrarily small as we move away from some compact set in $\Theta$, while (C2) is a condition on
the interplay between the likelihood and the prior-proposal pair. For example, it is satisfied
for symmetric $q$ when $p$ is continuous, everywhere positive with exponential or heavier tails,
or alternatively, if the likelihood is continuous, everywhere positive and decays at most
exponentially fast. It should be clear that (C1) and (C2) are not mutually exclusive. (C3)
is a global variant of (C2), included for completeness, and all results proven under (C2) also
hold under (C3) with simplified proofs that are omitted.

We first provide a general theorem that supplements Roberts & Tweedie (1996, Theorem 5.1) 
for reversible kernels, indicating that lack of geometric ergodicity due to arbitrarily "sticky" 
states coincides with lack of variance bounding. All proofs are housed in Appendix A. 

Theorem 1. For any $\nu$ not concentrated at a single point and any reversible, irreducible,
$\nu$-invariant Markov kernel $P$ such that $P(\theta, \{\theta\})$ is a measurable function, if
$\nu$-$\operatorname{ess\,sup}_\theta P(\theta, \{\theta\}) = 1$ then $P$ is not variance bounding.

Our first result concerning the kernels under study is negative, and indicates that performance
of $P_{1,N}$ and $P_{2,N}$ under (C1) can be poor, irrespective of the value of $N$.

Theorem 2. Assume (C1). For all $N \in \mathbb{N}$, $P_{1,N} \notin \mathcal{V}$ and $P_{2,N} \notin \mathcal{V}$.

Remark 1. Theorem 2 immediately implies that under (C1), $P_{1,N} \notin \mathcal{G}$ and $P_{2,N} \notin \mathcal{G}$ by
Roberts & Rosenthal (2008, Theorem 1). That $P_{1,N} \notin \mathcal{G}$ under (C1) is not covered by
Andrieu & Roberts (2009, Theorem 8) or Andrieu & Vihola (2012, Propositions 9 or 12),
since what they term weights in this context, $w(x)/h(\theta)$, are upper bounded by $h(\theta)^{-1}$ for
$\pi$-a.a. $\theta$ and $f_\theta$-a.a. $x$ but are not bounded uniformly in $\theta$.

We emphasize that the choice of $q$ is crucial to establishing Theorem 2. Since $H > 0$, if
$q(\theta, \vartheta) = g(\vartheta)$, e.g., and $\sup_\theta p(\theta)/g(\theta) < \infty$ then by Mengersen & Tweedie (1996,
Theorem 2.1), $P_{1,N}$ is uniformly ergodic and hence in $\mathcal{G}$. Uniform ergodicity, however, does little
to motivate the use of an independent proposal in challenging scenarios, particularly when
$\Theta$ is high dimensional.

Our next three results concern $P_3$, and demonstrate first that variance bounding of $P_{\mathrm{MH}}$ is
a necessary condition for variance bounding of $P_3$, and further that $P_{\mathrm{MH}}$ is at least as good
as $P_3$ in terms of the asymptotic variance of estimates such as (2). More importantly, and
in contrast to $P_{1,N}$ and $P_{2,N}$, the kernel $P_3$ can systematically inherit variance bounding
and geometric ergodicity from $P_{\mathrm{MH}}$ under (C2) or (C3).

Proposition 1. $P_3 \in \mathcal{V} \Rightarrow P_{\mathrm{MH}} \in \mathcal{V}$ and $\sigma^2_{\mathrm{as}}(P_{\mathrm{MH}}, \varphi) \le \sigma^2_{\mathrm{as}}(P_3, \varphi)$.

Theorem 3. Assume (C2) or (C3). $P_{\mathrm{MH}} \in \mathcal{V} \Rightarrow P_3 \in \mathcal{V}$.

Theorem 4. Assume (C2) or (C3). $P_{\mathrm{MH}} \in \mathcal{G} \Rightarrow P_3 \in \mathcal{G}$.

Remark 2. The statements in Proposition 1 and Theorems 3 and 4 are precise in the following
sense. There exist models for which $P_3 \in \mathcal{V} \setminus \mathcal{G}$ and $P_{\mathrm{MH}} \in \mathcal{V} \setminus \mathcal{G}$, and there exist models
for which $P_3 \in \mathcal{G}$ and $P_{\mathrm{MH}} \in \mathcal{V} \setminus \mathcal{G}$, i.e. under (C2) or (C3), $P_{\mathrm{MH}} \in \mathcal{V} \nRightarrow P_3 \in \mathcal{G}$ and
$P_3 \in \mathcal{G} \nRightarrow P_{\mathrm{MH}} \in \mathcal{G}$. Example 2 illustrates these possibilities.

Remark 3. The conditions (C2) and (C3) are not tight, but counterexamples can be
constructed to show that extra conditions are necessary for the results to hold. The conditions
allow us to ensure that $\alpha_{\mathrm{MH}}(\theta, \vartheta)$ and $\alpha_3(\theta, \vartheta)$ differ only in a controlled manner, for all $\theta$ and
"enough" $\vartheta$, and hence that $P_{\mathrm{MH}}$ and $P_3$ are not too different. As an example of the possible
differences between $P_{\mathrm{MH}}$ and $P_3$ more generally, consider the case where $p(\theta) = \tilde{p}(\theta)/\psi(\theta)$
and $h(\theta) = \tilde{h}(\theta)\psi(\theta)$ for some $\psi : \Theta \to (0, 1]$. Then properties of $P_{\mathrm{MH}}$ depend only on $\tilde{p}$ and
$\tilde{h}$ whilst those of $P_3$ can additionally be dramatically altered by the choice of $\psi$.

Theorem 4 can be used to provide sufficient conditions for $P_3 \in \mathcal{G}$ through $P_{\mathrm{MH}} \in \mathcal{G}$ and
(C2). The regular contour condition obtained in Jarner & Hansen (2000, Theorem 4.3), e.g.,
implies the following corollary.

Corollary 1. Assume (a) $h$ decays super-exponentially and $p$ has exponential or heavier
tails, or (b) $p$ has super-exponential tails and $h$ decays exponentially or slower. If, moreover,
$\pi$ is continuous and everywhere positive, $q$ is symmetric satisfying $q(\theta, \vartheta) \ge \varepsilon_q$ whenever
$|\theta - \vartheta| \le \delta_q$, for some $\varepsilon_q, \delta_q > 0$, and

$$\limsup_{|\theta| \to \infty} \frac{\theta}{|\theta|} \cdot \frac{\nabla \pi(\theta)}{|\nabla \pi(\theta)|} < 0, \qquad (9)$$

where $\cdot$ denotes the Euclidean scalar product, then $P_3 \in \mathcal{G}$.

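The proofs of Theorems 3 and 4 can also be extended to cover the case where $\tilde{P}_{\mathrm{MH}}$ is a finite,
countable or continuous mixture of $P_{\mathrm{MH}}$ kernels associated with a collection of proposals
$\{q_s\}_{s \in S}$ and $\tilde{P}_3$ is the corresponding mixture of $P_3$ kernels. With a modification of (C2)
and (C3), the following proposition is stated without proof, and could be used, e.g., in
conjunction with Fort et al. (2003, Theorem 3).

(C4) $\forall K > 0$, $\exists M_K \in (0, \infty)$, $\forall q_t \in \{q_s\}_{s \in S}$, and

(i) $\forall (\theta, \vartheta) \in \Theta \times B_K(\theta)$ : $\pi(\theta)q_t(\theta, \vartheta) \wedge \pi(\vartheta)q_t(\vartheta, \theta) > 0$, if $q_t \in \mathcal{Q}$, or

(ii) $\forall (\theta, \vartheta) \in \Theta^2$ : $\pi(\theta)q_t(\theta, \vartheta) \wedge \pi(\vartheta)q_t(\vartheta, \theta) > 0$, if $q_t \notin \mathcal{Q}$,

either $\dfrac{h(\vartheta)}{h(\theta)} \in [M_K^{-1}, M_K]$ or $\dfrac{c_t(\vartheta, \theta)}{c_t(\theta, \vartheta)} \in [M_K^{-1}, M_K]$, where $c_t(\theta, \vartheta) := p(\theta)q_t(\theta, \vartheta)$.

Proposition 2. Let $\tilde{P}_{\mathrm{MH}}(\theta, \mathrm{d}\vartheta) = \int_S \mu(\mathrm{d}s) P^{(s)}_{\mathrm{MH}}(\theta, \mathrm{d}\vartheta)$, where $\mu$ is a mixing distribution
on $S$ and each $P^{(s)}_{\mathrm{MH}}$ is a $\pi$-invariant Metropolis-Hastings kernel with proposal $q_s$. Let
$\tilde{P}_3(\theta, \mathrm{d}\vartheta) = \int_S \mu(\mathrm{d}s) P^{(s)}_3(\theta, \mathrm{d}\vartheta)$ be defined analogously. Then

1. $\tilde{P}_3 \in \mathcal{V} \Rightarrow \tilde{P}_{\mathrm{MH}} \in \mathcal{V}$ and $\sigma^2_{\mathrm{as}}(\tilde{P}_{\mathrm{MH}}, \varphi) \le \sigma^2_{\mathrm{as}}(\tilde{P}_3, \varphi)$.

2. Assume (C4). $\tilde{P}_{\mathrm{MH}} \in \mathcal{V} \Rightarrow \tilde{P}_3 \in \mathcal{V}$.

3. Assume (C4). $\tilde{P}_{\mathrm{MH}} \in \mathcal{G} \Rightarrow \tilde{P}_3 \in \mathcal{G}$.
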

While the sampling of a random number of auxiliary variables in the implementation of $P_3$
appears to be helpful in inheriting qualitative properties of $P_{\mathrm{MH}}$, one may be concerned
that the computational effort associated with the kernel can be unbounded. Our final result
indicates that this is not the case whenever $p$ is proper.

Proposition 3. Let $(N_i)$ be the sequence of random variables associated with step 3 of
Algorithm 3 if one iterates $P_3$, with $N_j = 0$ if at iteration $j$ the kernel outputs at step 2.
Then if $\int p(\theta)\,\mathrm{d}\theta = 1$, $H > 0$, and $P_3$ is irreducible,

$$n := \lim_{m \to \infty} m^{-1} \sum_{i=1}^{m} N_i \le H^{-1} < \infty.$$



$H$ is a natural quantity when $p$ is proper; if $n_R$ is the expected number of proposals to obtain
a sample from $\pi$ using the rejection sampler of Pritchard et al. (1999) we have $n_R = 1/H$,
and if we construct $P_{1,N}$ with proposal $q(\theta, \vartheta) = p(\vartheta)$ then $H$ lower bounds its spectral gap.
In fact, $n$ can be arbitrarily smaller than $n_R$, as we illustrate via the following examples.

Example 1. $\Theta = \mathbb{Z}_+$, $p(\theta) = 1_{\mathbb{N}}(\theta)(1 - a)a^{\theta - 1}$ and $h(\theta) = b^\theta$ for $(a, b) \in (0, 1)^2$.



In Example 1, $\pi$ is a geometric distribution with success parameter $1 - ab$ and geometric
series manipulations provided in Appendix B give $n_R = (1 - ab)/(b(1 - a))$. If $q(\theta, \vartheta) =
1_{\{\theta - 1, \theta + 1\}}(\vartheta) \times 1/2$, we have

$$\frac{a(1 - ab)}{2b(1 - a)} \le n \le \frac{a(1 + b)(1 - ab)}{2b(1 - a)}, \qquad (10)$$

and so $n_R/n \ge 2/(a(1 + b))$, which grows without bound as $a \to 0$. Regarding the propriety
condition on $p$, we observe that $n_R \to \infty$ and $n \to \infty$ as $a \to 1$ with $b$ fixed.
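
The cost comparison in Example 1 can be checked numerically. The sketch below evaluates $n$ as the stationary average of the expected number of passes through step 3 of Algorithm 3, using the fact that this number is geometric with success probability $h(\theta) + h(\vartheta) - h(\theta)h(\vartheta)$ whenever step 2 does not stop. The series is truncated at a fixed number of terms, which is an assumption of the sketch; function names are illustrative.

```python
def expected_cost(a, b, terms=2000):
    """n for Example 1 with the +/-1 proposal, by direct summation over theta.

    The summand is pi(theta) * E[N | theta], simplified by dividing through by
    powers of b so that no tiny denominators appear.
    """
    total = 0.0
    for theta in range(1, terms + 1):
        # Move up: passes step 2 w.p. a, then geometric with mean 1 / success prob.
        up = (a / b) / (1.0 + b - b ** (theta + 1))
        # Move down: always passes step 2, but is impossible from theta = 1.
        down = 1.0 / (1.0 + b - b ** theta) if theta >= 2 else 0.0
        total += (1.0 - a * b) * a ** (theta - 1) * 0.5 * (up + down)
    return total


def rejection_cost(a, b):
    """n_R = 1 / H for Example 1."""
    return (1.0 - a * b) / (b * (1.0 - a))


a = 0.5
for b in (0.1, 0.5, 0.9):
    n, n_r = expected_cost(a, b), rejection_cost(a, b)
    print(b, round(n, 3), round(n_r, 3), n_r / n >= 2.0 / (a * (1.0 + b)))
```

With $a = 0.5$ the computed values of $n$ are approximately 4.77, 0.847 and 0.502 for $b = 0.1$, $0.5$ and $0.9$, matching the values quoted in Section 4.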

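Example 2. $\Theta = \mathbb{R}_+$, $p(\theta) = 1_{[0,a]}(\theta)/a$ and $h(\theta) = b 1_{[0,1]}(\theta)$ for $(a, b) \in [1, \infty) \times (0, 1]$.

In Example 2, $H^{-1} = a/b$ and $n \le b^{-1}$ for any $q$, so $n_R/n \ge a$. We observe that even
if $p$ is improper, $n$ is finite. Regarding Remark 2, for any $a \ge 1$, consider the proposal
$q(\theta, \vartheta) = 2 \times 1_{[0,1/2]}(\theta)1_{(1/2,1]}(\vartheta) + 2 \times 1_{(1/2,1]}(\theta)1_{[0,1/2]}(\vartheta)$. If $b = 1$, then $P_3 \in \mathcal{V} \setminus \mathcal{G}$ and
$P_{\mathrm{MH}} \in \mathcal{V} \setminus \mathcal{G}$. However, if $b \in (0, 1)$ then $P_3 \in \mathcal{G}$ and $P_{\mathrm{MH}} \in \mathcal{V} \setminus \mathcal{G}$.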



4 Discussion 



Our analysis suggests that $P_3$ may be geometrically ergodic and/or variance bounding in
a wide variety of situations where kernels $P_{1,N}$ and $P_{2,N}$ are not. In practice, (C2) can
be verified and used to inform prior and proposal choice to ensure that $P_3$ systematically
inherits these properties from $P_{\mathrm{MH}}$. Of course, variance bounding or geometric ergodicity of
$P_{\mathrm{MH}}$ is typically impossible to verify in the approximate Bayesian computation setting due
to the unknown nature of $f_\theta$. However, a prior with regular contours as per (9) will ensure
that $P_{\mathrm{MH}}$ is geometrically ergodic if $f_\theta$ decays super-exponentially and also has regular
contours. In addition, (C2) and (C3) are stronger than necessary but tighter conditions are
likely to be complicated and may require case-by-case treatment.

To supplement the qualitative results regarding geometric ergodicity of the kernels, we
investigated a modification of Example 1 with a finite number of states. More specifically,
we considered the case where the prior is truncated to the set $\{1, \ldots, D\}$ for some $D \in \mathbb{N}$.
In this context, we can calculate explicit transition probabilities and hence spectral gaps
$1 - \sup|S(P)|$ and asymptotic variances $\sigma^2_{\mathrm{as}}(P, \varphi)$ of (2) for $P_{2,N}$, $P_3$ and $P_{\mathrm{MH}}$. Figure 1 shows
the log spectral gaps for a range of values of $D$ for each kernel and $b \in \{0.1, 0.5, 0.9\}$. We can
see the spectral gaps of $P_3$ and $P_{\mathrm{MH}}$ stabilize, whilst those of $P_{2,N}$ decrease exponentially
fast in $D$, albeit with some improvement for larger $N$. The spectral gaps obtained, with (3),
suggest that the convergence of $P_{2,N}$ to $\pi$ can be extremely slow for some $\theta_0$ even when $D$
is relatively small. Indeed, in this finite, discrete setting with reversible $P$, the bounds

$$\frac{1}{2}\left(\max|S(P)|\right)^m \le \max_{\theta_0} \|\pi(\cdot) - P^m(\theta_0, \cdot)\|_{\mathrm{TV}} \le \frac{1}{2}\left(\max|S(P)|\right)^m \sqrt{\frac{1 - \min_\theta \pi(\theta)}{\min_\theta \pi(\theta)}}$$

hold (Montenegro & Tetali, 2006, Section 2 and Theorem 5.9), which clearly indicate that
$P_{2,N}$ can converge exceedingly slowly when $P_3$ and $P_{\mathrm{MH}}$ converge reasonably quickly. The
value of $n$ in these cases stabilized at 4.77, 0.847 and 0.502 respectively, within the bounds
of (10), and considerably smaller than 100.
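
For readers who wish to experiment with the truncated example, the sketch below builds the transition matrices of $P_{\mathrm{MH}}$ and $P_3$ on $\{1, \ldots, D\}$ and estimates the right spectral gap $1 - \sup S(P)$, the quantity tied to variance bounding. This is a simplification relative to the absolute gap $1 - \sup|S(P)|$ plotted in Figure 1, and the power-iteration routine, its iteration count and the parameter values are choices of this sketch, written without external libraries.

```python
import math


def build_kernel(D, a, b, accept):
    """Transition matrix on {1..D} for the +/-1 proposal, plus unnormalized pi."""
    p = [a ** t for t in range(D)]              # truncated prior, up to a constant
    h = [b ** (t + 1) for t in range(D)]
    P = [[0.0] * D for _ in range(D)]
    for i in range(D):
        for j in (i - 1, i + 1):
            if 0 <= j < D:                       # moves outside {1..D} are rejected
                P[i][j] = 0.5 * accept(p[i], p[j], h[i], h[j])
        P[i][i] = 1.0 - sum(P[i])
    return P, [pi * hi for pi, hi in zip(p, h)]


def accept_mh(p_i, p_j, h_i, h_j):
    return min(1.0, p_j * h_j / (p_i * h_i))


def accept_p3(p_i, p_j, h_i, h_j):
    return min(1.0, p_j / p_i) * h_j / (h_i + h_j - h_i * h_j)


def right_spectral_gap(P, weights, iters=3000):
    """Estimate 1 - sup S(P) for a reversible finite kernel by power iteration."""
    D = len(P)
    root = [math.sqrt(w) for w in weights]
    scale = math.sqrt(sum(r * r for r in root))
    u = [r / scale for r in root]                # top eigenvector of the symmetrized kernel
    # Lazy symmetrized kernel (I + S)/2, S_ij = sqrt(pi_i / pi_j) P_ij, spectrum in [0, 1].
    L = [[0.5 * ((i == j) + root[i] * P[i][j] / root[j]) for j in range(D)]
         for i in range(D)]
    v = [(-1.0) ** i + 0.1 * i for i in range(D)]  # arbitrary starting vector
    lam = 0.0
    for _ in range(iters):
        dot = sum(x * y for x, y in zip(u, v))
        v = [x - dot * y for x, y in zip(v, u)]  # stay orthogonal to sqrt(pi)
        w = [sum(L[i][j] * v[j] for j in range(D)) for i in range(D)]
        norm_v = math.sqrt(sum(x * x for x in v))
        norm_w = math.sqrt(sum(x * x for x in w))
        lam = norm_w / norm_v                    # tends to (1 + sup S(P)) / 2
        v = [x / norm_w for x in w]
    return 2.0 * (1.0 - lam)


for D in (5, 10, 15):
    gaps = [right_spectral_gap(*build_kernel(D, 0.5, 0.5, acc))
            for acc in (accept_mh, accept_p3)]
    print(D, gaps)
```

Because $\alpha_3 \le \alpha_{\mathrm{MH}}$ pointwise, the gap reported for $P_3$ should not exceed that for $P_{\mathrm{MH}}$, which gives a simple sanity check on the output.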




Figure 1: Plot of the log spectral gap against $D$ for $P_{2,1}$ (dot-dashed), $P_{2,100}$ (dotted), $P_3$
(dashed) and $P_{\mathrm{MH}}$ (solid), with $a = 0.5$. Panels: (a) $b = 0.1$, (b) $b = 0.5$, (c) $b = 0.9$.

Figures 2 and 3 show $\log \sigma^2_{\mathrm{as}}(P, \varphi)$ against $D$ for $\varphi_1(\theta) = \theta$ and $\varphi_2(\theta) = (ab)^{-\theta/2.1}$,
respectively, which were computed using the expression of Kemeny & Snell (1969, p. 84). The
choice of $\varphi_2$ is motivated by the fact that when $p$ is not truncated, $\varphi(\theta) = (ab)^{-\theta/(2+\delta)}$ is in
$L^2(\pi)$ if and only if $\delta > 0$. While $\sigma^2_{\mathrm{as}}(P, \varphi_1)$ is stable for all the kernels, $\sigma^2_{\mathrm{as}}(P, \varphi_2)$ increases
rapidly with $D$ for $P_{2,1}$ and $P_{2,100}$. We also note that while $\sigma^2_{\mathrm{as}}(P_{2,N}, \varphi_1)$ can be lower than
$\sigma^2_{\mathrm{as}}(P_3, \varphi_1)$, the former requires many more simulations from the likelihood. In fact, while
the results we have obtained pertain to qualitative properties of the Markov kernels, this
example illustrates that $P_3$ can significantly outperform $P_{2,100}$ for estimating even the more
well-behaved $\pi(\varphi_1)$, when cost per iteration of each kernel is taken into account.

Variance bounding and geometric ergodicity are likely to coincide in most applications of 
interest, as variance bounding but non-geometrically ergodic Metropolis-Hastings kernels 
exhibit periodic behaviour rarely encountered in statistical inference. We also note that 
bounds on the second largest eigenvalue and/or spectral gap of P3 in relation to properties 
of Pmh are also possible through Cheeger-like inequalities using conductance arguments as 
in the proofs of Theorems 3 and 4, although these may be quite loose in some situations 
(see, e.g., Diaconis & Stroock, 1991) and we have not pursued them here. Finally, Roberts 
& Rosenthal (2011) have demonstrated that some simple Markov chains that are not geo- 
metrically ergodic can converge extremely slowly and that properties of such algorithms can 
be very sensitive to even slight parameter changes. 

Figure 2: Plot of $\log \sigma^2_{\mathrm{as}}(P, \varphi_1)$ against $D$ for $P = P_{2,1}$ (dot-dashed), $P = P_{2,100}$ (dotted),
$P = P_3$ (dashed) and $P = P_{\mathrm{MH}}$ (solid), with $a = 0.5$. Panels: $b = 0.1$, $b = 0.5$, $b = 0.9$.

Figure 3: Plot of $\log \sigma^2_{\mathrm{as}}(P, \varphi_2)$ against $D$ for $P = P_{2,1}$ (dot-dashed), $P = P_{2,100}$ (dotted),
$P = P_3$ (dashed) and $P = P_{\mathrm{MH}}$ (solid), with $a = 0.5$. Panels: $b = 0.1$, $b = 0.5$, $b = 0.9$.

A Proofs

Many of our proofs make use of the relationship between conductance, the spectrum of
a Markov kernel, and variance bounding for reversible Markov kernels $P$. In particular,
conductance $\kappa > 0$ is equivalent to $\sup S(P) < 1$ (Lawler & Sokal, 1988, Theorem 2.1),
which as stated earlier is equivalent to variance bounding. Conductance $\kappa$ for a $\pi$-invariant
transition kernel $P$ on $\Theta$ is defined as

$$\kappa := \inf_{A : 0 < \pi(A) \le 1/2} \kappa(A), \qquad \kappa(A) := \pi(A)^{-1} \int_A P(\theta, A^c)\pi(\mathrm{d}\theta) = \int_\Theta P(\theta, A^c)\pi_A(\mathrm{d}\theta),$$

where $\pi_A(\mathrm{d}\theta) := \pi(\mathrm{d}\theta)1_A(\theta)/\pi(A)$. From (8), we define for any $q \in \mathcal{Q}$ the function

$$r_q(\delta) := \inf\{r : \forall \theta \in \Theta,\ q(\theta, B_r^c(\theta)) < \delta\}.$$

Proof of Theorem 1. If $\nu$-$\operatorname{ess\,sup}_\theta P(\theta, \{\theta\}) = 1$ and $P(\theta, \{\theta\})$ is measurable, then the
set $A_\tau = \{\theta \in \Theta : P(\theta, \{\theta\}) \ge 1 - \tau\}$ is measurable and $\nu(A_\tau) > 0$ for every $\tau > 0$.
Moreover, $a_0 = \lim_{\tau \searrow 0} \nu(A_\tau)$ exists, since $A_{\tau_2} \subset A_{\tau_1}$ for $\tau_2 < \tau_1$. Now, assume $a_0 > 0$, and
define $A_0 = \{\theta \in \Theta : P(\theta, \{\theta\}) = 1\} = \bigcap_n A_{\tau_n}$ where $\tau_n \searrow 0$. By continuity from above
$\nu(A_0) = a_0 > 0$ and, since $\nu$ is not concentrated at a single point, $P$ is reducible, which is
a contradiction. Hence $a_0 = 0$. Consequently, by taking $\tau_n \searrow 0$ with $\tau_1$ small enough, we
have $\nu(A_{\tau_n}) < 1/2$ for every $n$, and can upper bound the conductance of $P$ by

$$\kappa \le \lim_n \kappa(A_{\tau_n}) = \lim_n \int P(\theta, A_{\tau_n}^c)\nu_{A_{\tau_n}}(\mathrm{d}\theta) \le \lim_n \int P(\theta, \{\theta\}^c)\nu_{A_{\tau_n}}(\mathrm{d}\theta) \le \lim_n \tau_n = 0.$$

Therefore $P \notin \mathcal{V}$. □

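Proof of Theorem 2. We prove the result for $P_{2,N}$. The proof for $P_{1,N}$ is essentially identical,
and is omitted. By Theorem 1, it suffices to show that $\pi$-$\operatorname{ess\,sup}_\theta P(\theta, \{\theta\}) = 1$, i.e.

$$\forall \tau > 0,\ \exists A(\tau) \subseteq \Theta \text{ with } \pi(A(\tau)) > 0 \text{ s.t. } \forall \theta \in A(\tau),\ P_{2,N}(\theta, \Theta \setminus \{\theta\}) < \tau.$$

By (C1), $q \in \mathcal{Q}$. Given $\tau > 0$, let $r(\tau) = r_q(\tau/2)$, $v(\tau) = \inf\{v : \sup_{\theta \in B_v^c(0)} h(\theta) < 1 - [1 - \tau/2]^{1/N}\}$
and $A(\tau) = B^c_{v(\tau) + r(\tau)}(0)$. Then $\pi(A(\tau)) > 0$ and, using (4) and (5) under (C1), with
$S_z = \sum_{j=1}^{N} w(z_j)$ and $S_x = \sum_{j=1}^{N-1} w(x_j)$ as in Section 2, $\forall \theta \in A(\tau)$,

$$P_{2,N}(\theta, \Theta \setminus \{\theta\}) = \int_{\Theta \setminus \{\theta\}} \int_{\mathsf{Y}^N} \int_{\mathsf{Y}^{N-1}} \left\{1 \wedge \frac{c(\vartheta, \theta) S_z}{c(\theta, \vartheta)(1 + S_x)}\right\} f_\theta^{\otimes (N-1)}(\mathrm{d}x_{1:N-1})\, f_\vartheta^{\otimes N}(\mathrm{d}z_{1:N})\, q(\theta, \mathrm{d}\vartheta)$$

$$\le \sup_{\theta \in \Theta} q(\theta, B^c_{r(\tau)}(\theta)) + \int_{B_{r(\tau)}(\theta)} \int_{\mathsf{Y}^N} 1_{\{1, \ldots, N\}}(S_z)\, f_\vartheta^{\otimes N}(\mathrm{d}z_{1:N})\, q(\theta, \mathrm{d}\vartheta)$$

$$\le \frac{\tau}{2} + \sup_{\vartheta \in B_{r(\tau)}(\theta)} \left\{1 - (1 - h(\vartheta))^N\right\} \int_{B_{r(\tau)}(\theta)} q(\theta, \mathrm{d}\vartheta) < \tau.$$

□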



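The following two lemmas are pivotal in the proofs of Proposition 1 and Theorems 3 and 4,
and make extensive use of (4), (6) and (7).

Lemma 1. $P_3(\theta, \{\theta\}) \ge P_{\mathrm{MH}}(\theta, \{\theta\})$.

Proof. We show that for any $(\theta, \vartheta)$, $\alpha_3(\theta, \vartheta) \le \alpha_{\mathrm{MH}}(\theta, \vartheta)$. Consider the case $c(\vartheta, \theta) \le c(\theta, \vartheta)$.
Then since $h(\theta) \le 1$,

$$\alpha_3(\theta, \vartheta) = \frac{c(\vartheta, \theta)}{c(\theta, \vartheta)} \cdot \frac{h(\vartheta)}{h(\theta) + h(\vartheta) - h(\vartheta)h(\theta)} \le 1 \wedge \frac{c(\vartheta, \theta)h(\vartheta)}{c(\theta, \vartheta)h(\theta)} = \alpha_{\mathrm{MH}}(\theta, \vartheta).$$

Similarly, if $c(\vartheta, \theta) > c(\theta, \vartheta)$ we have

$$\alpha_3(\theta, \vartheta) = \frac{h(\vartheta)}{h(\theta) + h(\vartheta) - h(\vartheta)h(\theta)} \le 1 \wedge \frac{h(\vartheta)}{h(\theta)} \le 1 \wedge \frac{c(\vartheta, \theta)h(\vartheta)}{c(\theta, \vartheta)h(\theta)} = \alpha_{\mathrm{MH}}(\theta, \vartheta).$$

This immediately implies $P_3(\theta, \{\theta\}) \ge P_{\mathrm{MH}}(\theta, \{\theta\})$ since $P(\theta, \{\theta\}) = 1 - \int_{\Theta \setminus \{\theta\}} q(\theta, \vartheta)\alpha(\theta, \vartheta)\,\mathrm{d}\vartheta$.

□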

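Lemma 2. Assume (C2). For $\pi$-a.a. $\theta$ and any $A \subseteq \Theta$ such that $A \cap \{\theta\} \ne \emptyset$ and $r > 0$,

$$P_{\mathrm{MH}}(\theta, A^c) \le \sup_{\theta} q(\theta, B_r^c(\theta)) + (1 + M_r)P_3(\theta, A^c).$$

Proof. We begin by showing that, for $\vartheta \in B_r(\theta)$,

$$\alpha_{\mathrm{MH}}(\theta, \vartheta) \le (1 + M_r)\alpha_3(\theta, \vartheta). \qquad (11)$$

First we deal with the case $h(\vartheta)p(\vartheta)q(\vartheta, \theta) = 0$. Then the inequality is trivially satisfied
as $\alpha_{\mathrm{MH}}(\theta, \vartheta) = \alpha_3(\theta, \vartheta) = 0$. Conversely, if $\pi(\theta)q(\theta, \vartheta) > 0$ and $\pi(\vartheta)q(\vartheta, \theta) > 0$ and
additionally $\vartheta \in B_r(\theta)$, then under (C2),

$$(1 + M_r)\{c(\theta, \vartheta)h(\theta) \vee c(\vartheta, \theta)h(\vartheta)\}
\ge \{c(\theta, \vartheta)h(\theta) \vee c(\vartheta, \theta)h(\vartheta)\} + \{c(\theta, \vartheta)h(\vartheta) \vee c(\vartheta, \theta)h(\theta)\}$$

$$\ge \{(c(\theta, \vartheta)h(\theta) + c(\theta, \vartheta)h(\vartheta)) \vee (c(\vartheta, \theta)h(\theta) + c(\vartheta, \theta)h(\vartheta))\},$$

and so

$$\alpha_{\mathrm{MH}}(\theta, \vartheta) = \frac{c(\vartheta, \theta)h(\vartheta)}{c(\theta, \vartheta)h(\theta) \vee c(\vartheta, \theta)h(\vartheta)}
\le \frac{(1 + M_r)\,c(\vartheta, \theta)h(\vartheta)}{\{c(\theta, \vartheta)h(\theta) + c(\theta, \vartheta)h(\vartheta)\} \vee \{c(\vartheta, \theta)h(\theta) + c(\vartheta, \theta)h(\vartheta)\}}$$

$$= (1 + M_r)\left\{1 \wedge \frac{c(\vartheta, \theta)}{c(\theta, \vartheta)}\right\}\frac{h(\vartheta)}{h(\vartheta) + h(\theta)} \le (1 + M_r)\alpha_3(\theta, \vartheta),$$

i.e. $\alpha_{\mathrm{MH}}(\theta, \vartheta) \le (1 + M_r)\alpha_3(\theta, \vartheta)$. Hence, we have

$$P_{\mathrm{MH}}(\theta, A^c) = \int_{A^c} \alpha_{\mathrm{MH}}(\theta, \vartheta)q(\theta, \mathrm{d}\vartheta) \le q(\theta, B_r^c(\theta)) + \int_{A^c \cap B_r(\theta)} (1 + M_r)\alpha_3(\theta, \vartheta)q(\theta, \mathrm{d}\vartheta)$$

$$\le \sup_{\theta} q(\theta, B_r^c(\theta)) + (1 + M_r)P_3(\theta, A^c).$$

□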



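Proof of Proposition 1. Lemma 1 gives $P_3 \preceq P_{\mathrm{MH}}$ in the sense of (Peskun, 1973; Tierney,
1998) and so $\sigma^2_{\mathrm{as}}(P_{\mathrm{MH}}, \varphi) \le \sigma^2_{\mathrm{as}}(P_3, \varphi)$. By Roberts & Rosenthal (2008, Theorem 8),
$P_3 \preceq P_{\mathrm{MH}} \Rightarrow (P_3 \in \mathcal{V} \Rightarrow P_{\mathrm{MH}} \in \mathcal{V})$. □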



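Proof of Theorem 3. We prove the result under (C2). Let $A$ be a measurable set with
$\pi(A) > 0$. Since $q \in \mathcal{Q}$ we let $R = r_q(\kappa_{\mathrm{MH}}/2)$ and $M_R$ be as in (C2). Then by Lemma 2 we
have

$$\kappa_{\mathrm{MH}}(A) = \int_A P_{\mathrm{MH}}(\theta, A^c)\pi_A(\mathrm{d}\theta) \le \frac{\kappa_{\mathrm{MH}}}{2} + (1 + M_R)\int_A P_3(\theta, A^c)\pi_A(\mathrm{d}\theta)
= \frac{\kappa_{\mathrm{MH}}}{2} + (1 + M_R)\kappa_3(A).$$

Since $A$ is arbitrary, we conclude that $\kappa_{\mathrm{MH}} \le 2(1 + M_R)\kappa_3$, so $\kappa_{\mathrm{MH}} > 0 \Rightarrow \kappa_3 > 0$. □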



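Proof of Theorem 4. Recall that geometric ergodicity is equivalent to $\sup |S(P)| < 1$. From
the spectral mapping theorem (Conway, 1990) this is equivalent to $\sup S(P^2) < 1$, where
$S(P^2)$ is the spectrum of $P^2$, the two-fold iterate of $P$. We denote by $\kappa_3^{(2)}$ and $\kappa_{\mathrm{MH}}^{(2)}$ the
conductance of $P_3^2$ and $P_{\mathrm{MH}}^2$ respectively. We prove the result under (C2). Since $q \in \mathcal{Q}$ we
let $R = r_q(\kappa_{\mathrm{MH}}^{(2)}/4)$ and $M_R$ be as in (C2). By Lemmas 1 and 2, we have for any measurable
$A \subseteq \Theta$

$$P_{\mathrm{MH}}(\theta, A) = P_{\mathrm{MH}}(\theta, A \setminus \{\theta\}) + 1_A(\theta)P_{\mathrm{MH}}(\theta, \{\theta\})$$

$$\le \kappa_{\mathrm{MH}}^{(2)}/4 + (1 + M_R)P_3(\theta, A \setminus \{\theta\}) + 1_A(\theta)P_3(\theta, \{\theta\})$$

$$\le \kappa_{\mathrm{MH}}^{(2)}/4 + (1 + M_R)P_3(\theta, A).$$
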
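We can also upper bound, for any $\theta \in \Theta$, the Radon-Nikodym derivative of $P_{\mathrm{MH}}(\theta, \cdot)$ with
respect to $P_3(\theta, \cdot)$ for any $\vartheta \in B_R(\theta)$ as

$$\frac{\mathrm{d}P_{\mathrm{MH}}(\theta, \cdot)}{\mathrm{d}P_3(\theta, \cdot)}(\vartheta) = 1_{B_R(\theta) \setminus \{\theta\}}(\vartheta)\frac{\alpha_{\mathrm{MH}}(\theta, \vartheta)}{\alpha_3(\theta, \vartheta)} + 1_{\{\theta\}}(\vartheta)\frac{P_{\mathrm{MH}}(\theta, \{\theta\})}{P_3(\theta, \{\theta\})}$$

$$\le 1_{B_R(\theta) \setminus \{\theta\}}(\vartheta)(1 + M_R) + 1_{\{\theta\}}(\vartheta) \le 1 + M_R,$$

where we have used (11) and Lemma 1 in the first inequality. Let $A$ be a measurable set
with $\pi(A) > 0$. We have

$$\kappa_{\mathrm{MH}}^{(2)}(A) = \int_A \left(\int_\Theta P_{\mathrm{MH}}(\vartheta, A^c)P_{\mathrm{MH}}(\theta, \mathrm{d}\vartheta)\right)\pi_A(\mathrm{d}\theta)$$

$$= \int_A \left(\int_{B_R^c(\theta)} P_{\mathrm{MH}}(\vartheta, A^c)P_{\mathrm{MH}}(\theta, \mathrm{d}\vartheta) + \int_{B_R(\theta)} P_{\mathrm{MH}}(\vartheta, A^c)P_{\mathrm{MH}}(\theta, \mathrm{d}\vartheta)\right)\pi_A(\mathrm{d}\theta)$$

$$\le \int_A \left(q(\theta, B_R^c(\theta)) + \int_{B_R(\theta)} P_{\mathrm{MH}}(\vartheta, A^c)P_{\mathrm{MH}}(\theta, \mathrm{d}\vartheta)\right)\pi_A(\mathrm{d}\theta)$$

$$\le \kappa_{\mathrm{MH}}^{(2)}/4 + \int_A \int_{B_R(\theta)} P_{\mathrm{MH}}(\vartheta, A^c)P_{\mathrm{MH}}(\theta, \mathrm{d}\vartheta)\pi_A(\mathrm{d}\theta)$$

$$\le \kappa_{\mathrm{MH}}^{(2)}/4 + \int_A \int_{B_R(\theta)} \left\{\kappa_{\mathrm{MH}}^{(2)}/4 + (1 + M_R)P_3(\vartheta, A^c)\right\}P_{\mathrm{MH}}(\theta, \mathrm{d}\vartheta)\pi_A(\mathrm{d}\theta)$$

$$\le \kappa_{\mathrm{MH}}^{(2)}/2 + (1 + M_R)\int_A \int_{B_R(\theta)} P_3(\vartheta, A^c)P_{\mathrm{MH}}(\theta, \mathrm{d}\vartheta)\pi_A(\mathrm{d}\theta)$$

$$= \kappa_{\mathrm{MH}}^{(2)}/2 + (1 + M_R)\int_A \int_{B_R(\theta)} P_3(\vartheta, A^c)\frac{\mathrm{d}P_{\mathrm{MH}}(\theta, \cdot)}{\mathrm{d}P_3(\theta, \cdot)}(\vartheta)P_3(\theta, \mathrm{d}\vartheta)\pi_A(\mathrm{d}\theta)$$

$$\le \kappa_{\mathrm{MH}}^{(2)}/2 + (1 + M_R)^2\int_A \int_\Theta P_3(\vartheta, A^c)P_3(\theta, \mathrm{d}\vartheta)\pi_A(\mathrm{d}\theta)$$

$$= \kappa_{\mathrm{MH}}^{(2)}/2 + (1 + M_R)^2\kappa_3^{(2)}(A).$$

Since $A$ is arbitrary, we conclude that $\kappa_{\mathrm{MH}}^{(2)} \le 2(1 + M_R)^2\kappa_3^{(2)}$, so $\kappa_{\mathrm{MH}}^{(2)} > 0 \Rightarrow \kappa_3^{(2)} > 0$. □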

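Proof of Proposition 3. If the current state of the Markov chain is $\theta$, the expected value of
$N$ is

$$\mathbb{E}[N \mid \theta] = \int_\Theta \frac{1 \wedge \frac{c(\vartheta, \theta)}{c(\theta, \vartheta)}}{h(\theta) + h(\vartheta) - h(\theta)h(\vartheta)}\, q(\theta, \mathrm{d}\vartheta),$$

since upon drawing $\vartheta \sim q(\theta, \cdot)$, $N = 0$ with probability $1 - \{1 \wedge c(\vartheta, \theta)/c(\theta, \vartheta)\}$ and with probability
$\{1 \wedge c(\vartheta, \theta)/c(\theta, \vartheta)\}$ it is the minimum of two geometric random variables with success probabilities
$h(\theta)$ and $h(\vartheta)$, i.e. it is a geometric random variable with success probability $h(\theta) + h(\vartheta) -
h(\theta)h(\vartheta)$. Since $P_3$ is $\pi$-invariant and irreducible, the strong law of large numbers for Markov
chains implies

$$n = \int_\Theta \mathbb{E}[N \mid \theta]\,\pi(\mathrm{d}\theta) \le H^{-1} < \infty,$$

where we have used $\int_\Theta p(\theta)\,\mathrm{d}\theta = 1$ in the first inequality. □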
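B Calculations for Example 1

To obtain $n_R = H^{-1}$ calculate

$$H = (1 - a)\sum_{\theta=1}^{\infty} a^{\theta-1}b^{\theta} = b(1 - a)\sum_{\theta=0}^{\infty} (ab)^{\theta} = \frac{b(1 - a)}{1 - ab},$$

so $n_R = (1 - ab)/(b(1 - a))$. To bound $n$, we have

$$n = \sum_{\theta=1}^{\infty} (1 - ab)(ab)^{\theta-1}\,\frac{1}{2}\left\{\frac{1_{\{\theta \ge 2\}}}{b^{\theta-1} + b^{\theta} - b^{2\theta-1}} + \frac{a}{b^{\theta} + b^{\theta+1} - b^{2\theta+1}}\right\}
= \frac{1 - ab}{2}\sum_{\theta=1}^{\infty} a^{\theta-1}\left\{\frac{1_{\{\theta \ge 2\}}}{1 + b - b^{\theta}} + \frac{a/b}{1 + b - b^{\theta+1}}\right\},$$

and so both

$$n \ge \frac{1 - ab}{2(1 + b)}\left\{\sum_{\theta=2}^{\infty} a^{\theta-1} + \frac{a}{b}\sum_{\theta=1}^{\infty} a^{\theta-1}\right\} = \frac{a(1 - ab)}{2b(1 - a)}$$

and

$$n \le \frac{1 - ab}{2}\left\{\sum_{\theta=2}^{\infty} a^{\theta-1} + \frac{a}{b}\sum_{\theta=1}^{\infty} a^{\theta-1}\right\} = \frac{a(1 + b)(1 - ab)}{2b(1 - a)}.$$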